The present invention relates to a pipeline structure
for use in a digital system.

A pipeline structure consists of a sequence of 
functional units (stages), which perform a task in several
steps; the stages work in parallel thus giving higher 
throughput than if all the steps had to be completed before 
starting a next task. Pipelines are commonly used in several 
applications, for example, to process different parts of an 
instruction in a microprocessor. 

Typically, the pipeline has a synchronous architecture. 
A synchronous pipeline receives a single clock signal, which
controls all the stages. As a consequence, every stage must 
complete its work within one clock period. 

A drawback of the synchronous pipeline is that all the
stages switch at the same time. This involves high peaks of
power consumption (due to the current absorbed by the
short-circuits that are formed during the switching of the
transistors of the logic gates, and to the current needed
for charging and discharging wires and driven capacitors).
These peaks of power consumption introduce sources of noise,
which can jeopardise the functionality of a whole electronic
device embedding the pipeline. Moreover, they impose several
constraints on the design of a power supply structure;
particularly, metal tracks used to supply the electronic
device (when integrated in a chip of semiconductor material)
must be dimensioned so as to withstand the aforementioned
high peaks; as a consequence, an increased area of the chip
is required to integrate the electronic device.

Asynchronous pipelines have also been proposed, in 
order to reduce the peaks of power consumption. In an 
asynchronous pipeline, all the stages proceed independently 
(so that they do not switch at the same time). A handshaking
mechanism is then used to maintain every pair of adjacent 
stages in synchronisation. For this purpose, each stage 
generates a signal indicative of the completion of its work. 
This signal is used to move the result of the stage to a 
next stage, and then to trigger starting of the next stage. 

However, the implementation of the handshaking 
mechanism is relatively complex. Moreover, an additional 
circuit is required to synchronise the flow of input and
output information with the outside. 

It is an object of the present invention to overcome 
the above-mentioned drawbacks. In order to achieve this 
object, a structure as set out in the first claim is 
proposed. 

Briefly, the present invention provides a pipeline
structure for use in a digital system including a plurality
of stages arranged in a sequence from a first stage for
receiving an input of the pipeline structure to a last stage
for providing an output of the pipeline structure, at least
one intermediate stage being interposed between the first
stage and the last stage, wherein the first stage and the
last stage are controlled by a main clock signal; the
pipeline structure further includes phase shifting means for
generating at least one local clock signal from the main
clock signal for controlling the at least one intermediate
stage, the main clock signal and the at least one local
clock signal being out of phase.

Moreover, the present invention provides a digital 
system including this pipeline structure, and an electronic 
device including the digital system; a corresponding method 
of operating a pipeline structure is also encompassed. 

Further features and the advantages of the solution 
according to the present invention will be made clear by the 
following description of a preferred embodiment thereof, 
given purely by way of a non-restrictive indication, with 
reference to the attached figures, in which: 

Figure 1 is a schematic block diagram of a hand-held 
computer wherein the pipeline structure of the invention can 
be used; 

Figure 2 illustrates the functional blocks of the 
pipeline; and 

Figure 3 is a time diagram showing operation of the 
pipeline structure.

With reference in particular to Figure 1, a hand-held
computer 100 is depicted. The hand-held computer 100, also
known as palmtop, pocket computer or Personal Digital
Assistant (PDA), consists of a very small system that
literally fits in one hand. The hand-held computer 100 is
formed by several units, which are connected in parallel to
a communication bus 105. In detail, a microprocessor 110
controls operation of the hand-held computer 100, a DRAM 115
is directly used as a working memory by the microprocessor
110, and a Read Only Memory (ROM) 120 stores basic code for
a bootstrap of the hand-held computer 100.

Several peripheral units are further connected to the 
bus 105. Particularly, a non-volatile memory 125, typically 
consisting of a flash E²PROM, operates as a solid-state mass
memory for the hand-held computer 100. Moreover, the
hand-held computer 100 includes input devices 130 (for example,
an electronic pen or stylus), and output devices 135 (for
example, a flat panel screen made with a TFT technology).
Interfaces 140 are used to connect external peripherals 
(such as a PCMCIA network card) to the hand-held computer 
100. 

A timing unit 145 generates a main clock signal CLKm, 
which is used to synchronise operation of the hand-held 
computer 100. A battery pack 150 provides a power supply
voltage Vdd for all the units of the hand-held computer 100, 
so as to enable the hand-held computer 100 to run without 
plugging it in. 

The microprocessor 110 has a pipeline architecture,
wherein a sequence of stages simultaneously processes
different parts of every instruction to be executed by the
microprocessor 110. Particularly, a first stage fetches the
instruction (from the DRAM 115), a second stage decodes the
instruction, a third stage fetches the respective arguments
(if any), a fourth stage executes the operations required by
the instruction, and a fifth stage stores a possible result.
In this way, as one instruction is executed, the next
instruction is being decoded and the one after that is being
fetched. For maximum performance, the pipeline requires a
continuous stream of instructions; therefore, this technique
is commonly combined with instruction prefetch in an attempt
to keep the pipeline busy.

Similar considerations apply if the hand-held computer
has a different structure or includes other units (for
example, an infrared port), if the pipeline is formed by a
different number of stages, if no prefetch is implemented,
if each stage performs other functions, and the like.
Alternatively, the pipeline is used in the microprocessor of
a laptop computer, in a mobile telephone, in a memory
(wherein data is saved in a stack while next data is being
accessed), or more generally in any other digital system.

Considering now Figure 2, a structure 200 of the
pipeline used in the microprocessor of the hand-held
computer is depicted. The pipeline 200 is formed by N=5
stages STi (with i=1...N). Each stage STi includes a
register Ri and a combinatorial circuit Ci (save for the
last stage ST5, which only has the register R5 without any
combinatorial circuit). The combinatorial circuit Ci is
cascade connected to the respective register Ri; the
register R1 (of the first stage ST1) and the register R5 (of
the last stage ST5) define an input and an output,
respectively, of the pipeline 200.

An input word IN (for example, of 32 bits) received by
the pipeline 200 is stored into the register R1 (as a word
IN1). Each register Ri (with the exception of the last one)
operates as an input buffer for the respective combinatorial
circuit Ci. The combinatorial circuit Ci processes a word INi
provided by the register Ri, and generates a result
consisting of a word OUTi; the combinatorial circuit Ci has
a propagation time Pi (defined as the delay for obtaining
the word OUTi from the word INi). The output of the
combinatorial circuit Ci is then stored into the next
register Ri+1 (so that INi+1=OUTi). The word stored in the
last register R5 (OUT4) is sent to the outside as an output
word OUT of the pipeline 200.
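
The data path described above can be sketched in a few lines of Python. This is only an illustrative model: the four functions standing in for the combinatorial circuits C1-C4 are hypothetical (the text does not say what each circuit computes), and clocking is ignored so that only the relation INi+1=OUTi is shown.

```python
MASK32 = 0xFFFFFFFF  # the text gives 32 bits as an example word width

# Hypothetical stand-ins for the combinatorial circuits C1..C4.
CIRCUITS = [
    lambda w: (w + 1) & MASK32,   # C1
    lambda w: w ^ 0xFF,           # C2
    lambda w: (w << 4) & MASK32,  # C3
    lambda w: ~w & MASK32,        # C4
]

def pass_through(word_in, circuits=CIRCUITS):
    """Return the words IN1..IN5 held in turn by the registers R1..R5."""
    words = [word_in & MASK32]       # IN1: the input word IN loaded into R1
    for circuit in circuits:
        out_i = circuit(words[-1])   # OUTi produced by Ci from INi
        words.append(out_i)          # INi+1 = OUTi, stored into Ri+1
    return words

trace = pass_through(0x0000000F)
word_out = trace[-1]                 # OUT: content of the last register R5
```

The last register has no combinatorial circuit after it, so the list of circuits is one element shorter than the list of registers.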

Operation of the pipeline 200 is controlled by the main
clock signal CLKm. Particularly, each register Ri has a
control terminal, which is used to trigger loading of the
word supplied to its input (word IN for the register R1 and
word INi for the other registers R2-R5). The first register
R1 and the last register R5 are controlled by the main clock
signal CLKm directly. The other registers R2-R4 (of the
intermediate stages ST2-ST4) are controlled by local clock
signals CLK2-CLK4, respectively. The local clock signals
CLK2-CLK4 are generated from the main clock signal CLKm using
a phase shifting circuit. This circuit consists of a delay
block Di for each intermediate stage STi. The block Di
generates the corresponding local clock signal CLKi by applying
a pre-set delay di to the clock signal controlling the next
stage STi+1; in other words, the local clock signals CLK2,
CLK3 and CLK4 are generated by delaying the clock signals CLK3,
CLK4 and CLKm, respectively. The delay blocks D2-D4 ensure
that the main clock signal CLKm and every local clock signal
CLKi are out of phase, so that the registers R1-R5 do not all
switch at the same time.
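
As a rough sketch of the delay chain, the phase of each clock relative to CLKm can be accumulated backwards from the last stage. The function name and the numeric delays below are assumptions chosen for illustration; the sketch also assumes that the total delay stays below one clock period.

```python
def local_clock_offsets(delays):
    """Phase offset of each stage clock relative to the main clock CLKm.

    `delays` maps an intermediate stage index i to its delay d_i.
    CLK_i is the clock of stage i+1 delayed by d_i; the first and the
    last stage are driven by CLKm directly (offset 0).
    """
    n = max(delays) + 1                  # index N of the last stage
    offsets = {1: 0, n: 0}
    for i in range(n - 1, 1, -1):        # from stage N-1 back to stage 2
        offsets[i] = offsets[i + 1] + delays[i]
    return offsets

# Assumed delays d2=1, d3=2, d4=2 (arbitrary time units).
offsets = local_clock_offsets({2: 1, 3: 2, 4: 2})
```

With these values CLK4 lags CLKm by d4, CLK3 by d4+d3 and CLK2 by d4+d3+d2, which is the ordering used in the timing description.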

Similar considerations apply if the pipeline includes a
different number of stages (down to three), if the word
consists of a different number of bits, if the registers are
replaced with equivalent buffers, if a further combinatorial
circuit is connected to the last register, if the first
register is missing, and the like.

Operation of the pipeline described above is shown in
the simplified time diagram of Figure 3. The various signals
are switched at the rising edge of the respective clock
signal (CLKm, CLK2-CLK4); each word is represented by a band
(the crossing points of the band define the switching
times). The input word IN is loaded into the first register
R1 (word IN1) at the time T1 (in response to the rising edge
of the main clock signal CLKm). The word IN1 is processed by
the combinatorial circuit C1; the output of the
combinatorial circuit C1 (word OUT1) is stored into the
second register R2 (word IN2) at the next rising edge of the
local clock signal CLK2 (time T1+d4+d3+d2). In a similar
manner, the output of the combinatorial circuit C2 (word
OUT2) is stored into the third register R3 (word IN3) at the
next rising edge of the local clock signal CLK3 (time
T2+d4+d3). The output of the combinatorial circuit C3 (word
OUT3) is likewise stored into the fourth register R4 (word
IN4) at the next rising edge of the local clock signal CLK4
(time T3+d4). The word IN4 is then processed by the
combinatorial circuit C4; the output of the combinatorial
circuit C4 (word OUT4) is stored into the last register R5
(providing the output word OUT) at the next rising edge of
the main clock signal CLKm (time T4). Therefore, three clock
periods (T4-T1) are needed to pass through the entire
pipeline (in order to get the output word OUT corresponding
to the input word IN).
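
The sequence of loading times can be reproduced with a small calculation. The sketch below is an assumption-laden model rather than the circuit itself: it uses a hypothetical clock period of 10 units and delays d2=1, d3=2, d4=2, and it takes each register to load at the first rising edge of its own clock that falls strictly after the previous register has switched.

```python
def load_times(t1, period, offsets, n_stages=5):
    """Times at which R1..RN load the word that entered R1 at time t1.

    `offsets[i]` is the lag of the clock of stage i behind CLKm.
    """
    times = [t1]
    for i in range(2, n_stages + 1):
        prev = times[-1]
        first_edge = t1 + offsets[i]              # an edge of clock i
        k = (prev - first_edge) // period + 1     # first edge strictly after prev
        times.append(first_edge + k * period)
    return times

PERIOD = 10                                       # assumed clock period T
OFFSETS = {2: 5, 3: 4, 4: 2, 5: 0}                # d4+d3+d2, d4+d3, d4, 0
times = load_times(0, PERIOD, OFFSETS)
latency_in_periods = (times[-1] - times[0]) // PERIOD
```

With T1=0 the registers R2, R3 and R4 load at T1+d4+d3+d2, T2+d4+d3 and T3+d4, and R5 loads at T4, so the word needs three clock periods to traverse the pipeline under these assumed values.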

Correct operation of the pipeline requires that a new
word is not written into a register before the previous
one has been used (by the next combinatorial circuit).
Particularly, a generic word INi is supplied to the
combinatorial circuit Ci as soon as it is loaded into the
corresponding register Ri. The combinatorial circuit Ci
generates the resulting word OUTi after the respective
propagation time Pi. In order to ensure that the
combinatorial circuit Ci has completed its work before the
word OUTi is stored in the next register Ri+1, the difference
between the switching times of the registers Ri+1 and Ri must
be greater than the propagation time Pi of the combinatorial
circuit Ci.

Considering in particular the first stage ST1, the
register R1 switches at every rising edge of the main clock
signal CLKm (for example, T1); the second register R2
switches at the time T1+d4+d3+d2. Therefore, the
following relation must be met:

$$d_4 + d_3 + d_2 = \sum_{j=2}^{N-1} d_j > P_1$$

Denoting with Tm the time of a generic rising edge of the
main clock signal CLKm, a register Ri of any intermediate
stage (from ST2 to ST4) switches at the time

$$T_m + \sum_{j=i}^{N-1} d_j$$

and the next register Ri+1 switches at the time

$$T_m + \sum_{j=i+1}^{N-1} d_j + T$$

(where T is the period of the main clock signal CLKm).
Therefore, the constraint applicable to every intermediate
stage is:

$$T_m + \sum_{j=i+1}^{N-1} d_j + T - \left( T_m + \sum_{j=i}^{N-1} d_j \right) > P_i$$

$$T - d_i > P_i$$

Finally, the register R4 switches at the time T3+d4 and the
register R5 switches at the time T4=T3+T, so that the
following condition must be met for the last stage:

$$T_3 + T - (T_3 + d_4) > P_4$$

$$T - d_4 > P_4$$
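
The timing conditions lend themselves to a simple check. The helper below is a hypothetical sketch: it assumes that the first stage requires the sum of all delays to exceed P1, and that each later stage i requires T-di>Pi (the form stated above for the last stage, applied here to every intermediate stage as well). The numeric values are made up for the example.

```python
def timing_violations(period, delays, prop):
    """Return the indices of the stages whose timing condition fails.

    `delays` maps stage i (2..N-1) to d_i; `prop` maps stage i (1..N-1)
    to the propagation time P_i of its combinatorial circuit.
    """
    failing = []
    if sum(delays.values()) <= prop[1]:       # first stage: sum of d_j > P1
        failing.append(1)
    for i in sorted(delays):
        if period - delays[i] <= prop[i]:     # stage i: T - d_i > P_i
            failing.append(i)
    return failing

DELAYS = {2: 1, 3: 2, 4: 2}                   # assumed d2, d3, d4
ok = timing_violations(10, DELAYS, {1: 4, 2: 7, 3: 6, 4: 5})
slow_first = timing_violations(10, DELAYS, {1: 6, 2: 7, 3: 6, 4: 5})
```

In the second call the first circuit is assumed slower than the total delay, so only stage 1 is reported.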

Similar considerations apply if a different timing is
envisaged for the pipeline, if the signals are strobed after 
two or more clock periods from their switching, if the 
difference between the switching times of the adjacent 
registers is greater than the clock period, and the like. 

More generally, the present invention proposes a 
pipeline structure for use in a digital system. The pipeline 
structure includes a plurality of stages arranged in a 
sequence from a first stage (for receiving an input of the 
pipeline structure) to a last stage (for providing an output 
of the pipeline structure) ; one or more intermediate stages 
are interposed between the first stage and the last stage. 
The first stage and the last stage are controlled by a main 
clock signal. In the pipeline structure of the invention, 
phase shifting means are provided for generating one or more 
local clock signals (from the main clock signal) for 
controlling the intermediate stages; the main clock signal 
and the local clock signals are out of phase. 

The proposed solution strongly reduces the peaks of
power consumption in the pipeline structure. In this way,
fewer sources of noise are introduced. Moreover, the
constraints in the design of a power supply structure of a
whole electronic device embedding the pipeline are relaxed;
particularly, metal tracks used to supply the electronic
device (when integrated in a chip of semiconductor material)
may be smaller; as a consequence, a reduced area of the chip
is required to integrate the electronic device.
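
As a crude illustration of the reduced peaks, one can count how many registers are clocked at the same instant. This is not a power model: the number of simultaneously switching registers is used only as a stand-in for the current peak, and the delay values are assumptions.

```python
from collections import Counter

def peak_simultaneous_switching(offsets):
    """Largest number of registers clocked at the same instant in a period.

    `offsets` maps each register index to the lag of its clock behind
    the main clock CLKm (arbitrary time units).
    """
    return max(Counter(offsets.values()).values())

# Conventional synchronous pipeline: all five registers share one clock.
synchronous = {1: 0, 2: 0, 3: 0, 4: 0, 5: 0}

# Out-of-phase local clocks with assumed delays d2=1, d3=2, d4=2;
# R1 and R5 are both driven by CLKm and therefore still switch together.
staggered = {1: 0, 2: 5, 3: 4, 4: 2, 5: 0}

peak_sync = peak_simultaneous_switching(synchronous)
peak_staggered = peak_simultaneous_switching(staggered)
```

Under these assumptions the peak count drops from five registers to two.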

This result is achieved with a very simple
architecture, without any handshaking mechanism among the
stages of the pipeline.

In addition, the pipeline of the invention maintains a
synchronous interface with the outside (for the flow of
input and output information). Particularly, the proposed
solution makes it possible to reduce the number of clock
periods required to pass through the entire pipeline
(compared with the synchronous pipeline known in the art),
even if different timings are not excluded.

The preferred embodiment of the invention described
above offers further advantages.

For example, the pipeline has multiple intermediate 
stages, each one controlled by a corresponding local clock 
signal (all the local clock signals being out of
phase).

This feature further reduces the peaks of power
consumption (since all the intermediate stages switch at
different times).

Preferably, each local clock signal is obtained by
delaying the clock signal controlling an adjacent stage.

The proposed structure is very simple, but at the same
time effective.

As a further enhancement, each delay block receives as
input the clock signal of a next stage.

This solution makes it possible to ensure correct
operation of the pipeline with shorter delays (than if the
local clock signals were obtained from the previous stage).

Alternatively, the local clock signals are not all out
of phase, two or more stages are controlled by the same
local clock signal, the pipeline includes a single
intermediate stage, each local clock signal is obtained by
delaying another clock signal (for example, the one
controlling the previous stage), or different phase shifting
means are envisaged.

Particularly, each intermediate stage includes a
functional unit and a buffer; the functional unit has a
propagation time lower than the phase difference between the
corresponding clock signal and the clock signal controlling
the next stage.

This structure better exploits the advantageous effects
of the present invention (at the same time ensuring correct
operation of the pipeline).

Preferably, each stage consists of a combinatorial 
circuit and a corresponding buffer (storing a word).

In this way, the peaks of power consumption are reduced 
to the minimum. 

However, the solution according to the invention lends
itself to be implemented in a pipeline wherein each register
consists of a stack with a depth of two or more words, or
even in a pipeline having a different architecture (for
example, consisting of a simple shift register without any
combinatorial circuit).

Typically, the pipeline of the invention is used in a
digital system.

The improvement provided by the synchronous interface
of the proposed pipeline is clearly perceived in a digital
system of the synchronous type.

Moreover, the solution according to the present
invention is particularly advantageous in an electronic
device that is supplied by a battery (wherein the power
consumption is a very critical issue).

However, the pipeline of the invention is also suitable
to be used in a different digital system (even of the
asynchronous type), and in any other electronic device (for
example, supplied by mains electricity).

Naturally, in order to satisfy local and specific
requirements, a person skilled in the art may apply to the
solution described above many modifications and alterations
all of which, however, are included within the scope of
protection of the invention as defined by the following
claims.

CLAIMS

1. A pipeline structure (200) for use in a digital
system (110) including a plurality of stages (STi) arranged
in a sequence from a first stage (ST1) for receiving an
input of the pipeline structure to a last stage (ST5) for
providing an output of the pipeline structure, at least one
intermediate stage (ST2-ST4) being interposed between the
first stage and the last stage, wherein the first stage and
the last stage are controlled by a main clock signal (CLKm),

characterized in that
the pipeline structure further includes phase shifting means
(D2-D4) for generating at least one local clock signal
(CLK2-CLK4) from the main clock signal for controlling the at
least one intermediate stage, the main clock signal and the
at least one local clock signal being out of phase.

2. The pipeline structure (200) according to claim 1,
wherein the at least one intermediate stage consists of a
plurality of intermediate stages (ST2-ST4) each one
controlled by a corresponding local clock signal
(CLK2-CLK4), the local clock signals being out of phase.

3. The pipeline structure (200) according to claim 1 or
2, wherein for each intermediate stage (ST2, ST3, ST4) the
phase shifting means includes a delay block (D2, D3, D4) for
obtaining the corresponding local clock signal
(CLK2, CLK3, CLK4) from the clock signal (CLK3, CLK4, CLKm)
controlling an adjacent stage (ST3, ST4, ST5) in the sequence.

4. The pipeline structure (200) according to claim 3,
wherein each delay block (D2, D3, D4) is connected to obtain
the local clock signal (CLK2, CLK3, CLK4) controlling the
corresponding intermediate stage (ST2, ST3, ST4) from the clock
signal (CLK3, CLK4, CLKm) controlling a next stage (ST3, ST4, ST5)
in the sequence.

5. The pipeline structure (200) according to claim 4,
wherein each intermediate stage (ST1-ST4) includes a
functional unit (C1-C4) cascade connected to a buffer
(R1-R4), the buffer storing an output of the functional unit of
a previous stage in the sequence responsive to the
corresponding clock signal (CLKm, CLK2-CLK4) and the
functional unit having a propagation time lower than the
phase difference between the corresponding clock signal and
the clock signal (CLK2-CLK4, CLKm) controlling the next stage
(ST2-ST5).

6. The pipeline structure (200) according to claim 5,
wherein each functional unit consists of a combinatorial
circuit (C1-C4) and each buffer consists of a register
(R1-R4) for storing a word.

7. A digital system (110) including the pipeline
structure (200) according to any claim from 1 to 6.

8. The digital system (110) according to claim 7,
wherein the digital system is of the synchronous type.

9. An electronic device (100) including the digital
system (110) according to claim 7 or 8, and a battery (150)
for supplying the digital system (110).

10. A method of operating a pipeline structure (200)
for use in a digital system (110) including a plurality of
stages (STi) arranged in a sequence from a first stage (ST1)
for receiving an input of the pipeline structure to a last
stage (ST5) for providing an output of the pipeline
structure, at least one intermediate stage (ST2-ST4) being
interposed between the first stage and the last stage,
wherein the method includes the steps of:

controlling the first stage and the last stage by means
of a main clock signal (CLKm),

characterized by the steps of
generating at least one local clock signal (CLK2-CLK4)
from the main clock signal, the main clock signal and the at
least one local clock signal being out of phase, and

controlling the at least one intermediate stage by
means of the at least one local clock signal.



ABSTRACT

A PIPELINE STRUCTURE 

A pipeline structure (200) for use in a digital system 
(110) is proposed. The pipeline structure includes a 
plurality of stages (STi) arranged in a sequence from a 
first stage (ST1) for receiving an input of the pipeline
structure to a last stage (ST5) for providing an output of
the pipeline structure, at least one intermediate stage
(ST2-ST4) being interposed between the first stage and the
last stage, wherein the first stage and the last stage are
controlled by a main clock signal (CLKm); the pipeline
structure further includes phase shifting means (D2-D4) for 
generating at least one local clock signal (CLK2-CLK4) from 
the main clock signal for controlling the at least one 
intermediate stage, the main clock signal and the at least 
one local clock signal being out of phase. 

(Figure 2)